All the datasets used in this report are from the Bay of Fundy. There are 35 SectorIDs and 637 SiteIDs in the Bay of Fundy.
| SectorID | SiteID | Latitude | Longitude | log_FC | Date | normalize_tide |
|---|---|---|---|---|---|---|
| 1410 | 13501 | 44.1862 | -66.173 | 0.6419 | 2004-06-16 | 0.8149038 |
| 1410 | 13501 | 44.1862 | -66.173 | 1.9459 | 2004-07-14 | 0.8350515 |
| 1410 | 13501 | 44.1862 | -66.173 | 0.6419 | 2004-08-12 | 0.7380934 |
| 1410 | 13501 | 44.1862 | -66.173 | 0.6931 | 2004-08-26 | 0.6124402 |
| 1410 | 13501 | 44.1862 | -66.173 | 1.6094 | 2004-09-01 | 0.5346154 |
| 1410 | 13501 | 44.1862 | -66.173 | 1.6094 | 2007-06-05 | -0.0413462 |
| SiteID | Tide_Level | Tide_Speed | Week_rescaled | Year_rescaled | Latitude_rescaled |
|---|---|---|---|---|---|
| 13501 | HR | 0.1682692 | 0.50 | 0.2173913 | 0 |
| 13501 | HF | -0.0979381 | 0.58 | 0.2173913 | 0 |
| 13501 | HR | 0.2197802 | 0.68 | 0.2173913 | 0 |
| 13501 | HF | -0.2583732 | 0.72 | 0.2173913 | 0 |
| 13501 | HR | 0.4230769 | 0.72 | 0.2173913 | 0 |
| 13501 | MR | 0.4615617 | 0.48 | 0.3478261 | 0 |
Excluded data containing ‘E’ or NA.
Excluded data before the year 1999.
Applied min-max normalization to rescale the tide value to the range [-1, 1].
Applied a logarithmic transformation (base 10) to the fecal coliform counts.
Tide Speed is the difference between the two tidal hours in the original data.
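The rescaling step can be sketched as below. This is a minimal illustration, not the code used for the report: the function name `min_max_rescale` and the raw tide range are hypothetical. The 1999-2022 year range comes from the text, and mapping it onto [0, 1] gives values consistent with the Year_rescaled column shown above.

```python
def min_max_rescale(value, lo, hi, new_min=0.0, new_max=1.0):
    """Linearly map value from [lo, hi] onto [new_min, new_max]."""
    if hi == lo:
        raise ValueError("lo and hi must differ")
    unit = (value - lo) / (hi - lo)          # position within [0, 1]
    return new_min + unit * (new_max - new_min)

# Years 1999-2022 mapped onto [0, 1].
print(round(min_max_rescale(2004, 1999, 2022), 7))  # 0.2173913
print(round(min_max_rescale(2007, 1999, 2022), 7))  # 0.3478261

# Tide value mapped onto [-1, 1]; this raw range is a hypothetical example.
tide_min, tide_max = 0.0, 8.0
print(min_max_rescale(6.0, tide_min, tide_max, -1.0, 1.0))  # 0.5
```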
This heatmap shows the distribution of observations for different SiteIDs across years from 1999 to 2022 in the Bay of Fundy.
Red cells indicate SiteIDs and years where there are 2 or more observations.
Grey cells indicate SiteIDs and years where there are fewer than 2 observations.
By looking at the vertical distribution of red and grey cells, you can observe temporal trends.
The horizontal distribution of colors across SiteIDs can reveal spatial patterns.
Areas with continuous grey cells or large gaps in the heatmap can indicate periods or SiteIDs with missing or no data.
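The cell coloring rule described above can be sketched as follows. The function name `heatmap_cells` is hypothetical, and the input uses only the six rows of SiteID 13501 shown in the data table (five dated 2004, one dated 2007), so it does not reflect the full dataset.

```python
from collections import Counter

def heatmap_cells(observations, first_year, last_year, threshold=2):
    """Map each (site_id, year) pair to 'red' or 'grey'.

    observations is a list of (site_id, year) tuples, one per sample.
    """
    counts = Counter(observations)
    sites = sorted({site for site, _ in observations})
    return {
        (site, year): "red" if counts[(site, year)] >= threshold else "grey"
        for site in sites
        for year in range(first_year, last_year + 1)
    }

# Six rows from the table above: five in 2004 and one in 2007.
observations = [(13501, 2004)] * 5 + [(13501, 2007)]
cells = heatmap_cells(observations, 2004, 2007)
print(cells[(13501, 2004)], cells[(13501, 2005)], cells[(13501, 2007)])
# red grey grey
```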
Summary:
Observation Densities: Recent years (especially 2011, 2013-2022) show high observation densities across many SiteIDs.
Consistent Monitoring: Some SiteIDs have been consistently monitored over the years, as indicated by continuous red cells.
Gaps in Data: Early years (1999-2003) show more gaps, indicating fewer observations or missing data.
These boxplots show the tide value for each Tide level. The original data contain Start_WT and End_WT. Boxplots of tide value by the original tide state showed many outliers, so we re-leveled the Tide level based on the tide value, following the tide calculation instructions we received by email.
The plot shown is a result of Tukey’s Honest Significant Difference (HSD) post hoc test, which was performed after conducting an ANOVA (Analysis of Variance) to compare the mean log-transformed fecal coliform counts (log(FC)) across different Tide_Level groups.
Each line represents the difference in mean log(FC) between two Tide_Level groups.
The labels on the left indicate the pairs being compared (e.g., HR-HF, LF-HF).
The blue bars are the 95% confidence intervals for these differences.
If the confidence interval for a pair does not cross the dashed vertical line at zero, the difference between the means of those two groups is statistically significant; otherwise, it is not significant.
Left of Zero (Negative Mean Difference): If the entire confidence interval is to the left of the dashed vertical line at zero, it indicates that the first group in the pair has a significantly lower mean log(FC) compared to the second group.
Right of Zero (Positive Mean Difference): If the entire confidence interval is to the right of the dashed vertical line at zero, it indicates that the first group in the pair has a significantly higher mean log(FC) compared to the second group.
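The reading rule above can be written as a small function. This is only a sketch of the interpretation step, not of the Tukey HSD computation itself. The function name `interpret_interval` and the interval bounds are hypothetical and are not taken from the plot.

```python
def interpret_interval(pair, lower, upper):
    """Interpret a 95% confidence interval for a mean difference.

    pair is a label such as 'LF-HF'; the difference is first minus second.
    """
    first, second = pair.split("-")
    if lower > 0:
        return f"{first} significantly higher than {second}"
    if upper < 0:
        return f"{first} significantly lower than {second}"
    return f"no significant difference between {first} and {second}"

# Hypothetical interval bounds, for illustration only.
examples = {
    "LF-HF": (0.10, 0.35),    # entirely right of zero
    "MF-LF": (-0.30, -0.05),  # entirely left of zero
    "MR-HR": (-0.08, 0.12),   # crosses zero
}
for pair, (lower, upper) in examples.items():
    print(pair, "->", interpret_interval(pair, lower, upper))
```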
Blue color for the Tide Speed >= 0 (Rising Tide).
Red color for the Tide Speed < 0 (Falling Tide).
Summary for Tide Speed >= 0 (Rising Tide):
HR-HF: The confidence interval is to the right of zero, indicating that HR has a significantly higher mean log(FC) than HF.
LF-HF: The confidence interval is to the right of zero, indicating that LF has a significantly higher mean log(FC) than HF.
MF-HF: The confidence interval is to the right of zero, indicating that MF has a significantly higher mean log(FC) than HF.
MR-HF: The confidence interval is to the right of zero, indicating that MR has a significantly higher mean log(FC) than HF.
LF-HR: The confidence interval is to the right of zero, indicating that LF has a significantly higher mean log(FC) than HR.
MF-LF: The confidence interval is to the left of zero, indicating that MF has a significantly lower mean log(FC) than LF.
Summary for Tide Speed < 0 (Falling Tide):
LF-HF: The confidence interval is to the right of zero, indicating that LF has a significantly higher mean log(FC) than HF.
MF-HF: The confidence interval is to the right of zero, indicating that MF has a significantly higher mean log(FC) than HF.
MF-LF: The confidence interval is to the left of zero, indicating that MF has a significantly lower mean log(FC) than LF.
This plot is a collection of scatter plots, one per SectorID, each showing the distribution of log-transformed fecal coliform counts (log(FC)) over the weeks of the year.
The x-axis represents the weeks of the year (from 1 to 52), and the y-axis represents the log-transformed fecal coliform counts (log(FC)).
The density of points in each subplot indicates the number of observations (sample size) for each SectorID across different weeks.
The distribution of points along the x-axis for each SectorID can show temporal patterns in fecal coliform counts.
By comparing subplots, we can see differences and similarities in the distribution of log(FC) values among different SectorIDs.
Some SectorIDs might have more data points, indicating more frequent or consistent sampling, while others have fewer points.
Summary:
Some sectors, such as SectorIDs 680, 697, and 706-710, show a large number of observations spread throughout the weeks, indicating these sectors have been consistently sampled.
Other sectors, such as SectorIDs 667, 675, 686, 689, and 375409, have fewer data points, indicating less frequent sampling.
Seasonal Trends: In some sectors, like SectorID 1411-1451, there are visible clusters of observations in certain weeks, indicating that samples for these sectors were only taken during limited time periods.
This plot is similar to the previous one, but it specifically shows the sample size for each SectorID when the tide speed is greater than or equal to 0, which corresponds to the Rising Tide condition.
This plot is similar to the previous one, but it specifically shows the sample size for each SectorID when the tide speed is less than 0, which corresponds to the Falling Tide condition.
We have fitted two types of models:
Linear model: linear mixed-effects model
Non-linear model: generalized additive model (GAM)
For the linear model, we used a linear mixed-effects model, which plays an important role in longitudinal data analysis. Longitudinal data involve repeated observations of the same subjects over a period of time. In our study, the longitudinal data are fecal coliform counts from different sites within different sectors over multiple years. Here is our model:
\[ \log(\text{FC}_{ijk}) = \beta_0 + \beta_1 \cdot \text{normalize_tide}_{ijk} + u_{ijk} \]
\[ u_{ijk} = b_i + c_{ij} + d_{ijk} + \epsilon_{ijk} \]
Where:
\(\log(\text{FC}_{ijk})\) is the outcome: the logarithmic transformation (base 10) of the fecal coliform count for the \(k\)-th observation within the \(j\)-th site within the \(i\)-th sector.
\(\beta_0\) is the overall intercept.
\(\beta_1\) is the coefficient for the fixed effect of normalize_tide.
\(\text{normalize_tide}_{ijk}\) is the fixed-effect predictor: the normalized tide value for the \(k\)-th observation within the \(j\)-th site within the \(i\)-th sector.
\(u_{ijk}\): The combined random effect term (including sector, site within sector, date, and residual error).
\(b_i\): The random intercept for the \(i\)-th sector, \(b_i \sim N(0, \sigma^2_b)\).
\(c_{ij}\): The random intercept for the \(j\)-th site within the \(i\)-th sector, \(c_{ij} \sim N(0, \sigma^2_c)\).
\(d_{ijk}\): The random effect associated with the Date for the \(k\)-th observation within the \(j\)-th site within the \(i\)-th sector, \(d_{ijk} \sim N(0, \sigma^2_d)\).
\(\epsilon_{ijk}\): The residual error term, \(\epsilon_{ijk} \sim N(0, \sigma^2)\).
This linear mixed-effects model helps us to account for the nested structure of our data and the correlation between repeated measurements within the same site and sector.
Fixed effects represent the overall impact of predictors that are assumed to be the same across all observations. In this model, normalize_tide is the fixed effect.
Random effects account for variations at different levels of the data structure that are not explained by the fixed effects. In this model, \(b_i\), \(c_{ij}\), and \(d_{ijk}\) are random effects, representing the variability among sectors, sites, and dates, respectively.
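To make the nested structure concrete, the sketch below simulates data from a model of this form. It is not the fitting code used in the report, and it uses only the Python standard library. All parameter values and the function name `simulate_log_fc` are hypothetical. It also assumes that a date effect is shared by every observation taken on the same date.

```python
import random

def simulate_log_fc(n_sectors=3, sites_per_sector=2, n_dates=4,
                    beta0=1.0, beta1=-0.1,
                    sd_sector=0.5, sd_site=0.3, sd_date=0.2, sd_resid=0.4,
                    seed=0):
    """Draw log(FC) values from the nested random-intercept structure."""
    rng = random.Random(seed)
    # Assumption: one random effect per sampling date, shared across sites.
    date_effect = [rng.gauss(0.0, sd_date) for _ in range(n_dates)]
    rows = []
    for i in range(n_sectors):
        b_i = rng.gauss(0.0, sd_sector)           # sector intercept
        for j in range(sites_per_sector):
            c_ij = rng.gauss(0.0, sd_site)        # site-within-sector intercept
            for k in range(n_dates):
                tide = rng.uniform(-1.0, 1.0)     # normalize_tide in [-1, 1]
                eps = rng.gauss(0.0, sd_resid)    # residual error
                u = b_i + c_ij + date_effect[k] + eps
                rows.append({
                    "sector": i, "site": j, "date": k,
                    "normalize_tide": tide,
                    "log_fc": beta0 + beta1 * tide + u,
                })
    return rows

rows = simulate_log_fc()
print(len(rows))  # 24 observations: 3 sectors x 2 sites x 4 dates
```

Observations from the same site share both `b_i` and `c_ij`, which is what induces the within-site and within-sector correlation the model accounts for.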
| Dataset | Estimate β1 | P-Value |
|---|---|---|
| Full data | -0.068 | 2.6e-04 |
| Tide Speed >= 0 | 0.018 | 5.2e-01 |
| Tide Speed < 0 | -0.131 | 4.1e-07 |
Summary:
The normalized tide value has a significant negative effect on fecal coliform counts for the Full Dataset.
When considering only Tide Speed >= 0, the effect of normalized tide value on fecal coliform counts is positive but not statistically significant.
For Tide Speed < 0, the normalized tide value has a significant negative effect on fecal coliform counts.
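The summary statements above follow mechanically from the table. The sketch below reproduces that reading using the tabulated estimates and p-values. The 0.05 significance threshold is an assumption, since the report does not state the level used, and `describe` is a hypothetical helper name.

```python
# Estimates and p-values copied from the table above.
results = {
    "Full data": (-0.068, 2.6e-04),
    "Tide Speed >= 0": (0.018, 5.2e-01),
    "Tide Speed < 0": (-0.131, 4.1e-07),
}

def describe(estimate, p_value, alpha=0.05):
    """Summarize the sign and significance of a coefficient.

    alpha=0.05 is an assumed threshold, not one stated in the report.
    """
    direction = "positive" if estimate > 0 else "negative"
    verdict = "significant" if p_value < alpha else "not significant"
    return f"{direction}, {verdict}"

for dataset, (estimate, p_value) in results.items():
    print(f"{dataset}: {describe(estimate, p_value)}")
```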
| Dataset | Estimate β1 | P-Value |
|---|---|---|
| Summer Full data | -0.119 | 3.6e-07 |
| Summer Tide Speed >= 0 | -0.038 | 2.7e-01 |
| Summer Tide Speed < 0 | -0.186 | 7.1e-09 |
Summary:
For the Summer Dataset, the estimated coefficient is larger in magnitude (-0.119 versus -0.068) and the p-value is smaller (3.6e-07 versus 2.6e-04) than for the full dataset, indicating a stronger effect in the Summer Dataset.
The normalized tide value has a significant negative effect on fecal coliform counts for the Summer Dataset.
When considering only Tide Speed >= 0, the effect of normalized tide value on fecal coliform counts is negative but not statistically significant.
For Tide Speed < 0, the normalized tide value has a significant negative effect on fecal coliform counts.
GAMs are flexible models that allow for non-linear relationships between the predictors and the response variable. They use smooth functions to model these relationships, which can give a better fit than a traditional linear model when the relationships are non-linear.
The GAM Model is:
\[ \log(\text{FC}_i) = f_1(\text{Week_rescaled}_i) + f_2(\text{Week_rescaled}_i)\cdot \text{normalize_tide}_i + \epsilon_i \]
Where:
\(\log(\text{FC}_i)\) is the logarithmic transformation (base 10) of the fecal coliform count for the \(i\)-th observation.
\(f_1(\text{Week_rescaled}_i)\) is a smooth function of Week_rescaled.
\(f_2(\text{Week_rescaled}_i) \cdot \text{normalize_tide}_i\) is a smooth function of Week_rescaled modified by normalize_tide.
\(\epsilon_i \sim N(0, \sigma^2)\) is the residual error term.
The smooth function \(f_1(\text{Week_rescaled})\) is constructed using cyclic cubic splines, which suit periodic data such as weekly or seasonal data because the fitted curve joins smoothly where the end of one year meets the start of the next.
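The structure of this model, a seasonal baseline plus a tide coefficient that varies by week, can be illustrated as follows. This is a sketch only. Simple sine and cosine curves stand in for the fitted cyclic cubic splines, and all function shapes and constants are hypothetical. It assumes Week_rescaled runs over [0, 1]. Like a cyclic spline, the stand-in curves take the same value at both ends of the year.

```python
import math

def f1(week_rescaled):
    """Hypothetical seasonal baseline, periodic on [0, 1]."""
    return 1.0 + 0.5 * math.sin(2 * math.pi * (week_rescaled - 0.45))

def f2(week_rescaled):
    """Hypothetical week-varying coefficient of normalize_tide."""
    return -0.1 + 0.05 * math.cos(2 * math.pi * week_rescaled)

def predict_log_fc(week_rescaled, normalize_tide):
    """Mean of log(FC): f1(week) + f2(week) * normalize_tide."""
    return f1(week_rescaled) + f2(week_rescaled) * normalize_tide

# The stand-in smooths agree at the two ends of the year.
print(abs(f1(0.0) - f1(1.0)) < 1e-12)  # True
print(abs(f2(0.0) - f2(1.0)) < 1e-12)  # True

# At a fixed week, the tide effect is linear with slope f2(week).
week = 0.70
slope = predict_log_fc(week, 1.0) - predict_log_fc(week, 0.0)
print(abs(slope - f2(week)) < 1e-12)   # True
```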
Objective: Illustrates how log(FC) varies with Week_rescaled (on the x-axis) and Year_rescaled (on the y-axis).
The 3D surface highlights the interaction effect between the week and year on fecal coliform counts.
Weekly Patterns: The plot shows variations in fecal coliform levels over the weeks of the year. Peaks and troughs along the Week_rescaled axis indicate seasonal trends or periodic fluctuations in fecal coliform counts.
Yearly Patterns: The plot also reveals changes in fecal coliform levels over the years. Trends along the Year_rescaled axis can indicate whether fecal coliform counts have generally increased, decreased, or remained stable over time.
Summary:
Seasonal Trends
There is a noticeable increase in log(FC) values during the mid-year weeks, indicating higher fecal coliform counts in summer and early fall (weeks 26 to 41). The peak occurs around weeks 35 to 36.
During late winter and early spring (weeks 6 to 16), there seem to be relatively lower fecal coliform counts, as indicated by the blue areas.
Long-term Trends Over 20 Years
Objective: Illustrates how log(FC) varies with Week_rescaled (on the x-axis) and Latitude_rescaled (on the y-axis).
The 3D surface highlights the interaction effect between the week and latitude on fecal coliform counts.
Latitude Patterns: The plot reveals how fecal coliform levels vary across different latitudes. Trends along the Latitude_rescaled axis can indicate whether certain latitudes consistently have higher or lower fecal coliform counts.
Summary:
A common trend appears across most latitudes: log(FC) values begin to increase noticeably around week 21, peak around weeks 35 to 36, and then decrease.
The highest log(FC) values (yellow and red regions) are observed around the latitude range of 44.19 to 44.48, and 45.22, suggesting that these latitudes might experience higher fecal coliform levels.
The plots generated by the GAM provide insight into the relationship between the predictor variable ‘normalize_tide’ and the response variable log(FC).
Each plot represents a smooth term in the GAM model. The solid line represents the estimated smooth effect of the predictor variable, while the dashed lines represent the 95% confidence intervals.
The y-axis shows the effect size, and the x-axis shows the predictor variable (Week_rescaled).
The red horizontal line at y = 0 serves as a reference for no effect.
Summary:
For most of the weeks, normalize_tide shows a negative relationship with log(FC), as evidenced by the smooth effect being below the red reference line for all three datasets.
These trends highlight that the effect of normalize_tide on fecal coliform counts varies throughout the year, showing both negative and positive relationships depending on the week and tide speed conditions.
Summary: